Demand forecasting for companies with many branches, low sales numbers per
product, and non-recurring orderings
Abstract

We propose the new Top-Dog-Index to quantify the his- 
toric deviation of the supply data of many small branches 
for a commodity group from sales data. On the one hand, 
the common parametric assumptions on the customer de- 
mand distribution in the literature could not at all be sup- 
ported in our real-world data set. On the other hand, a 
reasonable-looking non-parametric approach to estimate
the demand distribution for the different branches directly 
from the sales distribution could only provide us with statis- 
tically weak and unreliable estimates for the future demand. 
Based on real-world sales data from our industry partner 
we provide evidence that our Top-Dog-Index is statistically 
robust. Using the Top-Dog-Index, we propose a heuristic
to improve the branch-dependent proportion between supply
and demand. Our approach cannot estimate the branch-dependent
demand directly. It can, however, classify the
branches into a given number of clusters according to a
historic oversupply or undersupply. This classification of
branches can iteratively be used to adapt the branch distri- 
bution of supply and demand in the future. 



1. Introduction 

Many retailers have to deal in their daily businesses with 
small profit margins. Their economic success lies mostly in 
the ability to forecast the customers' demand for individual 
products. More specifically: trade exactly what you can 
sell to your customers. This task has two aspects if your 
company has many branches in different regions: trade what 
your customers would like to buy because the product as 
such is attractive to them and provide a demand adjusted 
number of items for each branch or region. 

In this paper we deal with the second aspect only: meet
the branch-distributed demand for products as closely as
possible. The first aspect clearly also interferes with the
total demand for a product over all branches. Therefore, we
assume that we are given a fixed total number of items per
product which should be distributed over the set of branches
to meet the branch-dependent demand distribution as
closely as possible.

Our industry partner is a fashion discounter with more 
than 1 000 branches most of whose products are never re- 
plenished, except for the very few "never-out-of-stock"- 
products (NOS products): because of lead times of around 
three months, apparel replenishments would be too late any- 
way. In most cases the supplied items per product and ap- 
parel size lie in the range between 1 and 6. 

The task can be formulated informally as follows: Given 
historic supply and sales data for a commodity group, find 
out some robust information on the demand distribution 
over branches in that commodity group that can be used to 
optimize or at least to improve the supply distribution over 
all branches. 

We remark that trading fashion has the special feature
that the demand for different apparel sizes also varies over
the branches. In this article, however, we focus on the
aspect of improving the supply distribution over all branches.
The apparel size distribution problem is the subject of other
research in progress.

1.1 Related work 

Demand forecasting for NOS items is a well-studied
topic both in research and practice. The literature is vast,
see, e.g., lU |2] O for some surveys. For promotional
items and other items with single, very short life
cycles, however, we did not find any suitable demand fore- 
casting methods. 

The literature in revenue management (assortment
optimization, inventory control, dynamic pricing) very often
assumes that out-of-stock substitution effects are
negligible. This out-of-stock substitution in the sales data of our
partner, however, poses the biggest problem in our case. In
our real-world application we have no replenishment, small
delivery volumes per branch, lost sales with unknown or
even no substitution, and sales rates depending much more on
the success of the individual product at the time it was
offered than on the size. Therefore, estimating the absolute
future demand distribution from historical sales data with
no correction for out-of-stock substitution seems
questionable.

Most demand forecasting tools used in practice are
provided by specialized software companies. Quite a lot of
software packages are available, see [6] for an overview.
Our partner firm has checked several offers in the past and,
apart from the NOS segment, did not find any optimization
tools tailored to their needs.



1.2 Our contribution

We show that a reasonable-looking attempt to measure
the demand distribution over all branches by measuring for
each branch the sales over all products up to a certain day
(to avoid out-of-stock substitution) does not work because
of the high volatility in the sales rates of different products.

The key idea of this work is that estimating something
weaker than the absolute fraction of total demand of a
branch will result in stronger information that is still
sufficient to improve on the demand consistency of the supply
of branches.

More specifically, we propose the new Top-Dog-Index
(TDI) that can measure the branch-dependent deviation of
demand from supply, even for very small sales amounts or
short selling periods. This yields, in particular, an estimate
for the direction in which the supply was different from
demand in the past for each branch.

On the one hand, the TDI is a rather coarse measurement;
on the other hand, we can show that on our real-world data
set it is statistically robust in the sense that the TDIs of the
branches relative to each other are surprisingly similar on
several independent samples from the sales data and their
complements.

To show the value of the information provided by the
TDI, we propose a dynamic optimization procedure that
shifts relative supply among branches until the deviation
measured is as small as possible.

Of course, the impact of such an optimization procedure
has to be evaluated in practice. This is the subject of future
research.

1.3 Outline of the paper

In Section 2 we state the real-world problem we are
interested in. Moreover, we give an abstract problem
formulation. An obvious approach of determining the demand
distribution of the branches directly from historic sales data
is shown to be inappropriate on our given set of sales data
in Section 3. We propose our new Top-Dog-Index in
Section 4. We analyze its statistical robustness and its
distinctive character in clustering branches according to the
deviation of the historic ratio between supply and demand. In
Section 5 we describe a heuristic iterative procedure that
uses the information from the Top-Dog-Indices to alter the
supply distribution towards a suitable distribution that more
or less matches the demand distribution over branches. An
outlook and a conclusion are given in Section 6.

2 The real-world problem and an abstract
problem formulation



Our industry partner is a fashion discounter with over
1 000 branches. Products cannot be replenished and the
number of sold items per product and branch is rather small.
There are no historic sales data available for a specific
product since every product is sold only for one selling period.
The challenge for our industry partner is to determine a
suitable total amount of items of a specific product which
should be bought. For this part the knowledge and
experience of the buyers employed by the fashion discounter is
used. We seriously doubt that a software package based
on historic sales data can do better. But there is another
task that is more accessible to computer-aided forecasting
methods. Once the total amount of sellable items of a
specific product is determined, one has to decide how to
distribute this total amount to a set of branches B which differ
in their demand. The remaining part of this paper addresses
the latter task.

In the following, we formulate this problem in a more
abstract way. We are given a set of branches B, a set of
products P, a function $S(b, p)$ which denotes the historic supply
of product p for each branch b, and historic sales
transactions from which one can determine how many items of a
given product p are sold in a given branch b at a given day
of sales d. The target is to estimate a demand $\eta(b, p)$ for a
future product $p \notin P$ in a given branch b, where we can use
$\sum_{b \in B} \eta(b, p) = 1$ as normalization. This estimate $\eta(b, p)$
should be usable as good advice for a supply $S(b, p)$. No
further information, e.g., on a stochastic model for the
purchaser behavior, is available.

3 Some real-data analysis evaluating an obvi- 
ous approach 

The most obvious approach to determine a demand
distribution over branches is to count the sold items per branch
and divide by the total number of sold items. Here we have
some freedom to choose the day of sales on which we
measure these magnitudes. We have to balance two competing
influences. An early measurement may provide sales
numbers which are statistically too small for a good estimate.
On the other hand, on a late day of sales there might be too
much unsatisfied demand to estimate the demand since no
replenishment is possible in our application.

The business strategy of our partner is to cut prices
until all items are sold. So, a very late measurement would
only estimate the supply instead of the demand. As there
is no expert knowledge to decide which is the optimal day
of sales to measure the sales and estimate the
branch-dependent demand distribution, we have adapted a statistical
test to measure the significance of the demand distributions
obtained for each possible day of counting the sold items.
Given a data set D and a day of sales d, let $\varphi_{b,d}(D)$ be the
estimated demand for branch b determined using the amounts
of sold items up to day d as described above.
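
As an illustration only, the count-and-divide estimate described above can be sketched in Python as follows. The record layout `(branch, product, day_of_sale, quantity)`, with the day counted from the start of each product's selling period, and the function name `estimated_demand_shares` are assumptions of this sketch and not a description of our partner's data format.

```python
from collections import defaultdict


def estimated_demand_shares(sales, day):
    """Share of all items sold up to and including `day`, per branch.

    `sales` is an iterable of (branch, product, day_of_sale, quantity)
    tuples; this layout is a hypothetical one chosen for the sketch.
    """
    sold = defaultdict(int)
    for branch, _product, day_of_sale, quantity in sales:
        if day_of_sale <= day:
            sold[branch] += quantity
    total = sum(sold.values())
    if total == 0:
        return {}  # nothing sold yet, so no estimate is possible
    return {branch: count / total for branch, count in sold.items()}


# Toy data: two branches, two products.
sales = [
    ("b1", "p1", 1, 2),
    ("b2", "p1", 2, 1),
    ("b1", "p2", 5, 1),
    ("b2", "p2", 9, 4),
]
shares = estimated_demand_shares(sales, day=5)
```

In this toy data set the late sale of four items in branch b2 is ignored for day 5, which shows how strongly the estimate depends on the chosen measuring day.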

We normalize the values $\varphi_{b,d}(D)$ so that we have
$\sum_{b \in B} \varphi_{b,d}(D) = 1$ for each day of sales d, where B is the set
of branches. A common statistical method to analyze the
reliability of a prediction based on some data universe D is
to randomly partition D into two nearly equally sized
disjoint samples $D_1$ and $D_2$ with $D_1 \cup D_2 = D$ and to
compare the prediction based on $D_1$ with the prediction based
on $D_2$. If the two predictions differ substantially, then the
prediction method used is obviously not very trustworthy or,
statistically speaking, not very robust.

In the remainder of this section we analyze the
robustness of the prediction $\varphi_{b,d}(D)$ for every possible sales
day and find that even an optimal sales day for the
measurement does not provide a prediction good enough
for our purpose. To measure exactly by how much two
predictions $\varphi_{\cdot,d}(D_1)$ and $\varphi_{\cdot,d}(D_2)$ differ we introduce the
following:

Definition 1 For a given sales day d and two samples $D_1$
and $D_2$ we define the discrepancy $\delta_d$ as

$$\delta_d(D_1, D_2) := \sum_{b \in B} \left| \varphi_{b,d}(D_1) - \varphi_{b,d}(D_2) \right| \qquad (1)$$
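
A minimal sketch of the discrepancy and of a random split into two samples might look as follows. Splitting by product and the helper names `discrepancy` and `split_products` are assumptions made for illustration; branches missing from one sample are treated as having share 0.

```python
import random


def discrepancy(shares_1, shares_2):
    """Sum over all branches of |phi_1(b) - phi_2(b)|."""
    branches = set(shares_1) | set(shares_2)
    return sum(
        abs(shares_1.get(b, 0.0) - shares_2.get(b, 0.0)) for b in branches
    )


def split_products(products, seed=0):
    """Randomly partition products into two nearly equally sized sets."""
    shuffled = sorted(products)  # sort first so the result depends only on seed
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return set(shuffled[:half]), set(shuffled[half:])


# Two hypothetical normalized predictions for three branches.
phi_1 = {"b1": 0.5, "b2": 0.3, "b3": 0.2}
phi_2 = {"b1": 0.4, "b2": 0.4, "b3": 0.2}
delta = discrepancy(phi_1, phi_2)
```

Since both predictions sum to 1, the discrepancy always lies between 0 (identical predictions) and 2 (disjoint support).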
Similarly we define a discrepancy between supply and
demand. We compare both discrepancies in Figure 1. The
result: there is no measuring day for which the discrepancy
between two samples is smaller than the discrepancy
between a sample and the supply. In other words, if we
consider the discrepancy between supply and demand as a
measure for the inconsistency of the supply distribution with
the demand distribution, then either the supply is not
significantly inconsistent with demand (i.e., we had better
change nothing) or the measurements on the various
samples are significantly different (i.e., nothing can be learned
about how to correct the supply distribution).




Figure 1. Discrepancy for the first 60 days. 



This obvious approach does not work well in our case
because of the small sales numbers and
the interference of the demand of a branch with product
attractiveness and price cutting strategies. In Figure 2 we depict
the change of the prediction $\varphi_{b,d}(D)$ over time for five
characteristic but arbitrary branches.




Figure 2. Prediction $\varphi_{b,d}(D)$ over time.

We would like to remark that one of the authors
currently supervises two diploma theses which check some
common parametric models for demand forecasting from the
literature on historic sales data. None of them gives significant
information about the demand distribution over branches of our
data set because the data does not exhibit any similarity to
the parametric distributions coming from economic theory
and the like. This may be due to the fact that the
contaminating effects of promotion, mark-downs, and openings/closings of
competing stores prohibit a causal model for the demand.
We do not claim that the assumptions of parametric demand
models never hold, but in our application they are most
certainly not met.

4 The Top-Dog-Index (TDI) 

In the previous section we learned that in our application
we cannot utilize the most obvious approach of looking
at the sales distribution over the different branches on
an arbitrary but fixed day of the selling period of each
individual product. Since there is also no indication that any
of the common parametric models for the demand estimation
directly from sales data fits our application, we make
no assumptions on a specific stochastic distribution of the
purchaser behavior.

Our new idea dismisses the desire to estimate an absolute
percentage demand distribution for the branches. Instead we
develop an index measuring the relative success of a branch 
in the competition of all branches that can be estimated from 
historic sales data in a stable way. 

To motivate our distribution-free measurement we
consider the following thought experiment. For a given branch
b and given product p let $\theta_b(p)$ denote the stock-out day.
Let us assume that we have $\theta_b(p) = \theta_{b'}(p)$ for all products
p and all pairs of branches b, b'. In this situation one could
certainly say that the branch-dependent demand is perfectly
matched by the supply. Our measure tries to quantify the
deviation from the described ideal situation.

Therefore, we sort for each product p the stock-out days
$\theta_b(p)$ in increasing order. If for a fixed product p a branch b
is among the best third according to this list it gets a winning
point for p. If it is among the last third it is assigned a losing
point for p. With $B_p$ being the set of branches which trade
product p and P being the set of the products traded by the
company we can define more precisely:

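Definition 2 Let b be a branch. The Top-Dog-Count is defined as

$$W(b) := \sum_{p \in P} \left[ \tfrac{1}{3} |B_p| > \left| \{ b' \in B_p \mid \theta_{b'}(p) < \theta_b(p) \} \right| \right] \qquad (2)$$

and the Flop-Dog-Count is defined as

$$L(b) := \sum_{p \in P} \left[ \tfrac{1}{3} |B_p| > \left| \{ b' \in B_p \mid \theta_{b'}(p) > \theta_b(p) \} \right| \right] \qquad (3)$$

where a bracket $[\,\cdot\,]$ equals 1 if the enclosed condition holds and 0
otherwise. For a fixed dampening parameter $C > 0$ let

$$TDI(b) := \frac{W(b) + C}{L(b) + C} \qquad (4)$$

be the Top-Dog-Index (TDI) of branch b.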







If the TDI of a branch b is significantly large compared
to the TDIs of the other branches then we claim that branch
b was undersupplied in the past. Similarly, if the TDI of
branch b is significantly small compared to the TDIs of the
other branches then we claim that branch b was
oversupplied in the past. We give a heuristic optimization
procedure based on this information in Section 5. The effect of
the dampening parameter C is, on the one hand, that the TDI
is well defined since division by zero is circumvented. On
the other hand, and more importantly, the influence of small
Top-Dog- or Flop-Dog-Counts, which are statistically
unstable, is leveled to a decreased importance.
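
The counting scheme can be sketched in Python as follows. The sketch assumes that the TDI is the dampened ratio (W(b) + C) / (L(b) + C), that every branch trading a product has a known stock-out day for it, and that only products actually traded by a branch contribute to its counts; the nested-dictionary input and the function name `top_dog_index` are hypothetical.

```python
from collections import defaultdict


def top_dog_index(stock_out_days, C=10.0):
    """Return {branch: (W + C) / (L + C)} from stock-out days.

    stock_out_days[p][b] is the stock-out day of branch b for product p
    (an assumed input layout for this sketch).
    """
    wins = defaultdict(int)    # Top-Dog-Count W(b)
    losses = defaultdict(int)  # Flop-Dog-Count L(b)
    for days in stock_out_days.values():
        third = len(days) / 3
        for branch, theta in days.items():
            earlier = sum(1 for other in days.values() if other < theta)
            later = sum(1 for other in days.values() if other > theta)
            if third > earlier:  # fewer than a third sold out earlier
                wins[branch] += 1
            if third > later:    # fewer than a third sold out later
                losses[branch] += 1
    branches = {b for days in stock_out_days.values() for b in days}
    return {b: (wins[b] + C) / (losses[b] + C) for b in branches}


# Toy example with three branches and two products.
stock_out_days = {
    "p1": {"a": 3, "b": 10, "c": 20},
    "p2": {"a": 2, "b": 15, "c": 9},
}
tdi = top_dog_index(stock_out_days, C=1.0)
```

In the toy example branch a sells out first for both products and therefore gets the largest index, which would be read as a hint that it was undersupplied.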



4.1 Statistical significance of the TDI 

As in Section 3, we want to analyze the
significance of the proposed Top-Dog-Index on some real sales
data. Instead of two data sets $D_1$ and $D_2$ we use seven such
samples $D_i$. To this end we assign to each product
$p \in P$ an equidistributed random number $r_p \in \{1, 2, 3, 4\}$.
The samples $D_i$ are composed as summarized in Table 1.



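$D_1 := \{ p \in P \mid r_p \in \{1, 2\} \}$
$D_2 := \{ p \in P \mid r_p \in \{3, 4\} \}$
$D_3 := \{ p \in P \mid r_p \in \{1, 3\} \}$
$D_4 := \{ p \in P \mid r_p \in \{2, 4\} \}$
$D_5 := \{ p \in P \mid r_p \in \{3\} \}$
$D_6 := \{ p \in P \mid r_p \in \{1, 2, 4\} \}$
$D_7 := \{ p \in P \mid r_p \in \{1, 2, 3, 4\} \}$

Table 1. Assignment of test sets.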

For the interpretation we remark that the pairs $(D_1, D_2)$,
$(D_3, D_4)$, and $(D_5, D_6)$ are complementary. The whole
data population is denoted by $D_7$ and equals P. We use
$TDI(b, D_i)$ as an abbreviation of $TDI(b)$ where P is
replaced by $D_i$.
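
For illustration, the construction of the seven test sets can be sketched as follows. The product identifiers, the seed, and the function name `build_samples` are placeholders chosen for this sketch.

```python
import random

# Allowed random labels r_p for each sample D_1, ..., D_7 (as in Table 1).
SAMPLE_LABELS = {
    1: {1, 2},
    2: {3, 4},
    3: {1, 3},
    4: {2, 4},
    5: {3},
    6: {1, 2, 4},
    7: {1, 2, 3, 4},
}


def build_samples(products, seed=0):
    """Draw r_p uniformly from {1, 2, 3, 4} and build the seven samples."""
    rng = random.Random(seed)
    label = {p: rng.randint(1, 4) for p in products}
    return {
        i: {p for p in products if label[p] in allowed}
        for i, allowed in SAMPLE_LABELS.items()
    }


products = [f"p{k}" for k in range(100)]
samples = build_samples(products)
```

By construction the pairs (1, 2), (3, 4), and (5, 6) partition the product set, and sample 7 contains every product.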

Since the Top-Dog-Index is designed as a non- 
quantitative index we have to use another statistical test to 
assure ourselves that it gives some significant information. 
We find it convincing to regard the Top-Dog-Index as sig- 
nificant and robust whenever we have 



$$\frac{TDI(b, D_i)}{TDI(b, D_j)} \approx \frac{TDI(b', D_i)}{TDI(b', D_j)} \qquad (5)$$



for each pair of branches b, b' and each pair of samples
$D_i$, $D_j$. In words, we claim that the Top-Dog-Index is a
relative index which is independent of the underlying sample
if we consider a fixed universe $D_7$.

Figure 3. Relative distribution of the Top-Dog-Index on
different data samples and branches.



Our first aim is to provide evidence that the $TDI(b)$
values are robust measurements. There is a nice way to look
at equation (5) graphically. For each branch b let us plot a
column of the relative values $\frac{TDI(b, D_i)}{\sum_j TDI(b, D_j)}$ for all i. The
result for our data set is plotted in Figure 3.
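
A hedged sketch of how such columns of relative values could be computed is given below. It assumes that the relative value of branch b on sample i is TDI(b, D_i) divided by the sum of TDI(b, D_j) over all samples j; the spread measure `max_spread` is an additional illustrative quantity, not one used in our evaluation.

```python
def relative_values(tdi_by_sample):
    """rel[b][i] = TDI(b, D_i) / sum_j TDI(b, D_j).

    tdi_by_sample[i][b] holds TDI(b, D_i); only branches present in
    every sample are kept (an assumption of this sketch).
    """
    branches = set.intersection(*(set(t) for t in tdi_by_sample.values()))
    rel = {}
    for b in branches:
        total = sum(t[b] for t in tdi_by_sample.values())
        rel[b] = {i: t[b] / total for i, t in tdi_by_sample.items()}
    return rel


def max_spread(rel):
    """Largest gap, over all samples, between any two branches' values."""
    sample_ids = next(iter(rel.values())).keys()
    return max(
        max(r[i] for r in rel.values()) - min(r[i] for r in rel.values())
        for i in sample_ids
    )


# Two samples on which both branches scale by the same factor.
tdi_by_sample = {1: {"a": 2.0, "b": 1.0}, 2: {"a": 4.0, "b": 2.0}}
rel = relative_values(tdi_by_sample)
spread = max_spread(rel)
```

When the ratios of equation (5) agree exactly for all branches, every branch has the same column and the spread is zero; larger spreads indicate less robust measurements.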

To get the correct picture in the interpretation of the plot
of Figure 3 we compare it to the extreme cases of
deterministic numbers (i.e. $\frac{TDI(b, D_i)}{TDI(b, D_j)} = c$ for all i and j), see
Figure 4, and random numbers, see Figure 5.




Figure 4. Relative distribution of deterministic numbers.

As a matter of fact, the regions of the same color in the plot
of the relative distribution of deterministic numbers in
Figure 4 are formed by perfect rectangles, which in general are
not forced to have equal height.

Figure 5. Relative distribution of random variables
equidistributed in [0.5, 1.5].

As an example for a random plot we depict in Figure 5
the relative distribution of random numbers
equidistributed in the interval [0.5, 1.5].

In the plots of Figures 3, 4, and 5 we can see that the
TDI on the given data set behaves more like a perfect
deterministic estimation than a random number distribution.
(Ideally, one should now quantify how large the probability
is to obtain a TDI chart as in Figure 3 by a random
measurement.) So there is empirical evidence that the TDI
gives some stable information. As a comparison of the TDI
and the method described in Section 3 we depict the
corresponding relative distribution for measuring day 5 in
Figure 6. Although a measurement on this day was the best we
could find, it still produces more severe outliers than the
TDI measurement.

Figure 6. Relative distribution of the Section 3 estimate
for measuring day 5.



Now the question remains whether this information is 
enough to cluster branches into oversupplied and undersup- 
plied ones. More directly: is the distinctive character of the 
TDI strong enough? We consider this question in the next 
subsection. How the TDI information can be used to iter- 
atively improve the branch dependent ratio between supply 
and demand will be the topic of Section [5] 

4.2 The distinctive character of the TDI 

If one forces the values of the TDIs to be contained in
an interval of small length, then clearly a plot of the
relative distributions would look like the plot of Figure 4. As
a thought experiment just imagine what Figure 5 would
look like if we used random numbers
equidistributed in the interval [0.9, 1.1] instead of
the interval [0.5, 1.5].

Forcing the possible values of the TDIs into an interval
of small length is feasible by choosing a sufficiently large
dampening parameter C. So this parameter has to be
chosen with care. We remind ourselves that we would like to
use the TDIs to cluster branches. Therefore the TDIs should
vary over a not too small range of values to have a good
distinctive character. Clearly, by using the TDI we can only
detect possible improvements if the supply versus demand
ratio is actually inadequate to a certain degree. In Figure 7 we
have plotted the occurring TDIs of our data set to
demonstrate that there is indeed some variation of values in our data
set, no matter which sample we consider (let alone the data
universe). As one can see, the TDIs vary widely enough
to distinguish between historically under- and oversupplied
branches.




Figure 7. Occurring TDIs.

5 The heuristic supply optimization procedure
based on the TDI

So far we have developed a statistically stable index
capturing the deviation of supply from demand for each
branch. Now we have to specify how we can use this
information to improve the branch-dependent ratio between
supply and demand.

Let $S(b)$ be the historic supply of branch b, normalized
so that we have $\sum_{b \in B} S(b) = 1$. Our aim is to estimate
supplies $\tilde{S}(b)$, also fulfilling $\sum_{b \in B} \tilde{S}(b) = 1$, which are more
appropriate concerning the satisfaction of demand by using
the TDI information.

Therefore let us partition the interval $(0, \infty)$ of the
positive real numbers into a given number $l$ of appropriately
chosen intervals $\mathcal{I}_j$. Further we need $l$ appropriately chosen
increment numbers $\Delta_j$. Our proposed update formula for
the estimated branch-dependent demand is given by

$$\tilde{S}(b) := \frac{S(b) + \Delta_{j(b)}}{\sum_{b' \in B} \left( S(b') + \Delta_{j(b')} \right)} \qquad (6)$$

for all branches b, where $j(b)$ is the unique index with
$TDI(b) \in \mathcal{I}_{j(b)}$.

We do not claim that the $\tilde{S}(b)$ are a good estimation for
the demand of all branches. Our claim is that they approach
a good estimation of the branch-dependent demand if one
iterates the described procedure over several rounds and
carefully chooses the increment numbers $\Delta_j$, which may vary
over time.

Once one has a new proposal $\tilde{S}(b)$ of the relative supply
for each branch b, one only has to fit it into an
integer-valued supply for each new product p'. Given the problem of
apparel size assortment and pre-packing, this is easier said
than done and is the subject of further studies.

In contrast to the other sections, here we are somewhat
imprecise and there is a lot of freedom, e.g., how to choose
the intervals $\mathcal{I}_j$ and increment numbers $\Delta_j$. That is for
several reasons. On the one hand, that is exactly the point where
some expert from the business should calibrate the
parameters to specific data of the company. On the other hand,
there are quite a lot of possibilities how to do it in detail.
Their analysis will be a topic of future research. For the
practical application we consider simple rather than
sophisticated variants in the first step.

6 Conclusion and outlook



We have introduced the new Top-Dog-Index which is
capable of clustering branches of a retail company into
oversupplied and undersupplied branches at a statistically robust
level where more direct methods fail. The robustness
of this method is documented by some statistical tests based
on real-world data.

We have also documented that the distinctive character 
of the proposed TDI is significant for our application: for 
the first time we can gain information about the demand dis- 
tribution of branches from historic sales data on only few 
products with volatile success in sales rates and with un- 
known stock-out substitution effects, and this information 
does not depend too much on the sample of the sales data 
universe out of which the TDI is computed. 

For the dynamic optimization of the supply distribution
among branches, some fine tuning of parameters is needed;
for a real-world implementation these details have to be
fixed. This, together with a field study of the impacts of
an improved supply distribution, is research in progress.
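
To make the parameters that need tuning concrete, one round of the update step from Section 5 might be sketched as follows. The sketch reads the update formula as "add the increment of the interval containing TDI(b), then renormalize"; the interval boundaries, the increments, and the function name `update_supply` are purely hypothetical values chosen for illustration, not calibrated parameters.

```python
import bisect


def update_supply(supply, tdi, boundaries, increments):
    """One round of the TDI-based supply update.

    supply:     {branch: S(b)} with the values summing to 1
    tdi:        {branch: TDI(b)}
    boundaries: sorted interior cut points partitioning (0, inf)
    increments: one increment per interval (len(boundaries) + 1 values)
    """
    if len(increments) != len(boundaries) + 1:
        raise ValueError("need exactly one increment per interval")
    raw = {}
    for branch, share in supply.items():
        j = bisect.bisect_right(boundaries, tdi[branch])  # interval index j(b)
        raw[branch] = share + increments[j]
    total = sum(raw.values())
    return {branch: value / total for branch, value in raw.items()}


# Hypothetical example: three intervals (0, 0.8), [0.8, 1.25), [1.25, inf).
supply = {"a": 0.5, "b": 0.3, "c": 0.2}
tdi = {"a": 0.6, "b": 1.0, "c": 1.8}
new_supply = update_supply(
    supply, tdi, boundaries=[0.8, 1.25], increments=[-0.05, 0.0, 0.05]
)
```

Branch a, with a small TDI, loses relative supply, and branch c, with a large TDI, gains. Increments that are too large could overshoot or even produce negative shares, which is one reason why the calibration is left to experts and to several iterations.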